Online Learning in Stochastic Games and Markov Decision Processes

نویسندگان

  • Jia Yuan Yu
  • Shie Mannor
چکیده

In their standard formulations, stochastic games and Markov decision processes assume a rational opponent or a stationary environment. Online learning algorithms can adapt to arbitrary opponents and non-stationary environments, but do not incorporate the dynamic structure of stochastic games or Markov decision processes. We survey recent approaches that apply online learning to dynamic environments where the assumptions of rationality and stationarity are invalid. The emphasis is on the connections and differences between different approaches arising from different modeling assumptions and performance baselines.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent system and are used as a suitable framework for Multi agent Reinforcement Learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDP is proposed. In the proposed algorithm, MMDP ...

متن کامل

Multiagent Reinforcement Learning in Stochastic Games

We adopt stochastic games as a general framework for dynamic noncooperative systems. This framework provides a way of describing the dynamic interactions of agents in terms of individuals' Markov decision processes. By studying this framework, we go beyond the common practice in the study of learning in games, which primarily focus on repeated games or extensive-form games. For stochastic games...

متن کامل

Learning in Average Reward Stochastic Games A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games

A large class of sequential decision making problems under uncertainty with multiple competing decision makers can be modeled as stochastic games. It can be considered that the stochastic games are multiplayer extensions of Markov decision processes (MDPs). In this paper, we develop a reinforcement learning algorithm to obtain average reward equilibrium for irreducible stochastic games. In our ...

متن کامل

A Near-Optimal Poly-Time Algorithm for Learning a class of Stochastic Games

We present a new algorithm for polynomial time learning of near optimal behavior in stochastic games. This algorithm incorporates and integrates important recent results of Kearns and Singh [ 1998] in reinforcement learning and of Monderer and Tennenholtz [1997] in repeated games. In stochastic games we face an exploration vs. exploitation dilemma more complex than in Markov decision processes....

متن کامل

From Markov Chains to Stochastic Games

Markov chains1 and Markov decision processes (MDPs) are special cases of stochastic games. Markov chains describe the dynamics of the states of a stochastic game where each player has a single action in each state. Similarly, the dynamics of the states of a stochastic game form a Markov chain whenever the players’ strategies are stationary. Markov decision processes are stochastic games with a ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008